add level3 defaults for x86 #5418

Open: wants to merge 1 commit into base: develop

@commodo (Contributor) commented Aug 8, 2025

On some x86 configurations, the build fails with the error shown below.

It seems that OpenBLAS supports many x86 configurations, each specific to a CPU family name.
However, if none of them matches (as on some x86 builds), some parameters are left undefined, in this case the `CGEMM3M_DEFAULT_P`, `_Q`, `_R` and `_UNROLL_M` macros.

Link:
openwrt/packages#27179 (comment)

```
In file included from ../../common.h:586,
                 from gemm3m.c:40:
gemm3m_level3.c: In function 'cgemm3m_nn':
../../common_param.h:1435:33: error: 'CGEMM3M_DEFAULT_R' undeclared (first use in this function); did you mean 'CGEMM_DEFAULT_R'?
 1435 | #define CGEMM3M_R               CGEMM3M_DEFAULT_R
      |                                 ^~~~~~~~~~~~~~~~~
../../common_param.h:1671:25: note: in expansion of macro 'CGEMM3M_R'
 1671 | #define GEMM3M_R        CGEMM3M_R
      |                         ^~~~~~~~~
gemm3m_level3.c:306:37: note: in expansion of macro 'GEMM3M_R'
  306 |   for(js = n_from; js < n_to; js += GEMM3M_R){
      |                                     ^~~~~~~~
../../common_param.h:1435:33: note: each undeclared identifier is reported only once for each function it appears in
 1435 | #define CGEMM3M_R               CGEMM3M_DEFAULT_R
      |                                 ^~~~~~~~~~~~~~~~~
../../common_param.h:1671:25: note: in expansion of macro 'CGEMM3M_R'
 1671 | #define GEMM3M_R        CGEMM3M_R
      |                         ^~~~~~~~~
gemm3m_level3.c:306:37: note: in expansion of macro 'GEMM3M_R'
  306 |   for(js = n_from; js < n_to; js += GEMM3M_R){
      |                                     ^~~~~~~~
../../common_param.h:1434:33: error: 'CGEMM3M_DEFAULT_Q' undeclared (first use in this function); did you mean 'CGEMM_DEFAULT_Q'?
 1434 | #define CGEMM3M_Q               CGEMM3M_DEFAULT_Q
      |                                 ^~~~~~~~~~~~~~~~~
../../common_param.h:1661:25: note: in expansion of macro 'CGEMM3M_Q'
 1661 | #define GEMM3M_Q        CGEMM3M_Q
      |                         ^~~~~~~~~
gemm3m_level3.c:313:20: note: in expansion of macro 'GEMM3M_Q'
  313 |       if (min_l >= GEMM3M_Q * 2) {
      |                    ^~~~~~~~
i486-openwrt-linux-musl-gcc -Os -pipe -march=pentium-mmx -fno-caller-saves -fno-plt -fhonour-copts -ffile-prefix-map=/builder/build_dir/target-i386_pentium-mmx_musl/OpenBLAS-0.3.30=OpenBLAS-0.3.30 -Wformat -Werror=format-security -fstack-protector -D_FORTIFY_SOURCE=1 -Wl,-z,now -Wl,-z,relro  -I/builder/staging_dir/toolchain-i386_pentium-mmx_gcc-14.3.0_musl/usr/include -I/builder/staging_dir/toolchain-i386_pentium-mmx_gcc-14.3.0_musl/include -I/builder/staging_dir/toolchain-i386_pentium-mmx_gcc-14.3.0_musl/include/fortify   -DMAX_STACK_ALLOC=2048 -DEXPRECISION -m128bit-long-double -Wall -m32 -DF_INTERFACE_GFORT -fPIC -DC_LAPACK -DNO_LAPACK -DNO_LAPACKE -DNO_AVX -DNO_AVX512 -DSMP_SERVER -DNO_WARMUP -DMAX_CPU_NUMBER=2 -DMAX_PARALLEL_NUMBER=1 -DBUILD_SINGLE=1 -DBUILD_DOUBLE=1 -DBUILD_COMPLEX=1 -DBUILD_COMPLEX16=1 -DVERSION=\"0.3.30\" -UASMNAME -UASMFNAME -UNAME -UCNAME -UCHAR_NAME -UCHAR_CNAME -DASMNAME= -DASMFNAME=_ -DNAME=_ -DCNAME= -DCHAR_NAME=\"_\" -DCHAR_CNAME=\"\" -DNO_AFFINITY -I.  -DMAX_STACK_ALLOC=2048 -DEXPRECISION -m128bit-long-double -Wall -m32 -DF_INTERFACE_GFORT -fPIC -DC_LAPACK -DNO_LAPACK -DNO_LAPACKE -DNO_AVX -DNO_AVX512 -DSMP_SERVER -DNO_WARMUP -DMAX_CPU_NUMBER=2 -DMAX_PARALLEL_NUMBER=1 -DBUILD_SINGLE=1 -DBUILD_DOUBLE=1 -DBUILD_COMPLEX=1 -DBUILD_COMPLEX16=1 -DVERSION=\"0.3.30\" -UASMNAME -UASMFNAME -UNAME -UCNAME -UCHAR_NAME -UCHAR_CNAME -DASMNAME=cgemm3m_cn -DASMFNAME=cgemm3m_cn_ -DNAME=cgemm3m_cn_ -DCNAME=cgemm3m_cn -DCHAR_NAME=\"cgemm3m_cn_\" -DCHAR_CNAME=\"cgemm3m_cn\" -DNO_AFFINITY -I../.. -UDOUBLE  -DCOMPLEX  -c -UDOUBLE -DCOMPLEX -DCN gemm3m.c -o cgemm3m_cn.o
../../common_param.h:1433:33: error: 'CGEMM3M_DEFAULT_P' undeclared (first use in this function); did you mean 'CGEMM_DEFAULT_P'?
 1433 | #define CGEMM3M_P               CGEMM3M_DEFAULT_P
      |                                 ^~~~~~~~~~~~~~~~~
../../common_param.h:1651:25: note: in expansion of macro 'CGEMM3M_P'
 1651 | #define GEMM3M_P        CGEMM3M_P
      |                         ^~~~~~~~~
gemm3m_level3.c:325:20: note: in expansion of macro 'GEMM3M_P'
  325 |       if (min_i >= GEMM3M_P * 2) {
      |                    ^~~~~~~~
../../common_param.h:1436:33: error: 'CGEMM3M_DEFAULT_UNROLL_M' undeclared (first use in this function); did you mean 'CGEMM3M_DEFAULT_UNROLL_N'?
 1436 | #define CGEMM3M_UNROLL_M        CGEMM3M_DEFAULT_UNROLL_M
      |                                 ^~~~~~~~~~~~~~~~~~~~~~~~
../../common_param.h:1580:25: note: in expansion of macro 'CGEMM3M_UNROLL_M'
 1580 | #define GEMM3M_UNROLL_M CGEMM3M_UNROLL_M
      |                         ^~~~~~~~~~~~~~~~
gemm3m_level3.c:329:33: note: in expansion of macro 'GEMM3M_UNROLL_M'
  329 |           min_i = ((min_i / 2 + GEMM3M_UNROLL_M - 1)/GEMM3M_UNROLL_M) * GEMM3M_UNROLL_M;
      |                                 ^~~~~~~~~~~~~~~
make[4]: *** [Makefile:1865: cgemm3m_nn.o] Error 1
make[4]: *** Waiting for unfinished jobs....
In file included from ../../common.h:586,
                 from gemm3m.c:40:
gemm3m_level3.c: In function 'cgemm3m_cn':
../../common_param.h:1435:33: error: 'CGEMM3M_DEFAULT_R' undeclared (first use in this function); did you mean 'CGEMM_DEFAULT_R'?
 1435 | #define CGEMM3M_R               CGEMM3M_DEFAULT_R
      |                                 ^~~~~~~~~~~~~~~~~
../../common_param.h:1671:25: note: in expansion of macro 'CGEMM3M_R'
 1671 | #define GEMM3M_R        CGEMM3M_R
      |                         ^~~~~~~~~
gemm3m_level3.c:306:37: note: in expansion of macro 'GEMM3M_R'
  306 |   for(js = n_from; js < n_to; js += GEMM3M_R){
      |                                     ^~~~~~~~
../../common_param.h:1435:33: note: each undeclared identifier is reported only once for each function it appears in
 1435 | #define CGEMM3M_R               CGEMM3M_DEFAULT_R
      |                                 ^~~~~~~~~~~~~~~~~
../../common_param.h:1671:25: note: in expansion of macro 'CGEMM3M_R'
 1671 | #define GEMM3M_R        CGEMM3M_R
      |                         ^~~~~~~~~
gemm3m_level3.c:306:37: note: in expansion of macro 'GEMM3M_R'
  306 |   for(js = n_from; js < n_to; js += GEMM3M_R){
      |                                     ^~~~~~~~
../../common_param.h:1434:33: error: 'CGEMM3M_DEFAULT_Q' undeclared (first use in this function); did you mean 'CGEMM_DEFAULT_Q'?
 1434 | #define CGEMM3M_Q               CGEMM3M_DEFAULT_Q
      |                                 ^~~~~~~~~~~~~~~~~
../../common_param.h:1661:25: note: in expansion of macro 'CGEMM3M_Q'
 1661 | #define GEMM3M_Q        CGEMM3M_Q
      |                         ^~~~~~~~~
gemm3m_level3.c:313:20: note: in expansion of macro 'GEMM3M_Q'
  313 |       if (min_l >= GEMM3M_Q * 2) {
      |                    ^~~~~~~~
```
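The log shows a chain of macro indirections: `gemm3m_level3.c` uses `GEMM3M_R`, which `common_param.h` maps to `CGEMM3M_R` for this single-precision complex build (`-UDOUBLE -DCOMPLEX` on the compile line), which in turn expands to `CGEMM3M_DEFAULT_R`. Only the innermost name is expected to come from a CPU-specific parameter block, so when no block matches, the error only surfaces deep inside the level-3 driver. A minimal sketch of one way to guard against that, assuming `#ifndef` fallback defaults placed after all CPU-specific blocks; the numeric values are placeholders (not tuned numbers and not the values proposed in this PR), and `count_js_blocks` is a hypothetical helper:

```c
#include <assert.h>

/* CPU-specific tuning blocks would normally define these first; the #ifndef
   guards only fill in whatever is still missing. Placeholder values. */
#ifndef CGEMM3M_DEFAULT_UNROLL_M
#define CGEMM3M_DEFAULT_UNROLL_M 2
#endif
#ifndef CGEMM3M_DEFAULT_UNROLL_N
#define CGEMM3M_DEFAULT_UNROLL_N 2
#endif
#ifndef CGEMM3M_DEFAULT_P
#define CGEMM3M_DEFAULT_P 64
#endif
#ifndef CGEMM3M_DEFAULT_Q
#define CGEMM3M_DEFAULT_Q 128
#endif
#ifndef CGEMM3M_DEFAULT_R
#define CGEMM3M_DEFAULT_R 2048
#endif

/* Same indirection as in the log (GEMM3M_R -> CGEMM3M_R -> CGEMM3M_DEFAULT_R);
   with the fallbacks above, the innermost names always resolve. */
#define CGEMM3M_P        CGEMM3M_DEFAULT_P
#define CGEMM3M_Q        CGEMM3M_DEFAULT_Q
#define CGEMM3M_R        CGEMM3M_DEFAULT_R
#define CGEMM3M_UNROLL_M CGEMM3M_DEFAULT_UNROLL_M
#define GEMM3M_P         CGEMM3M_P
#define GEMM3M_Q         CGEMM3M_Q
#define GEMM3M_R         CGEMM3M_R
#define GEMM3M_UNROLL_M  CGEMM3M_UNROLL_M

/* Count iterations of the outer loop, using the same stride as
   "for(js = n_from; js < n_to; js += GEMM3M_R)" from the log. */
static long count_js_blocks(long n_from, long n_to) {
    long count = 0;
    assert(GEMM3M_R > 0);
    for (long js = n_from; js < n_to; js += GEMM3M_R) {
        count++;
    }
    return count;
}
```

With these placeholders, `count_js_blocks(0, 5000)` walks the loop three times (js = 0, 2048, 4096). A fallback like this trades a loud build failure for possibly untuned parameters, so the values still matter once the build succeeds.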
@commodo (Contributor, Author) commented Aug 8, 2025

I'll admit that the defaults may not be correct, so they are up for debate.

This fixes the OpenWrt x86 build.

@martin-frbg (Collaborator) commented

How are you building OpenBLAS? What you saw is what happens when no TARGET option was specified and autodetection of the CPU model by the getarch helper program failed.

commodo added a commit to commodo/packages that referenced this pull request Aug 8, 2025
This change updates the version to 0.3.30
Added patch to fix x86 builds.

Also sent upstream:
   OpenMathLib/OpenBLAS#5418

Signed-off-by: Alexandru Ardelean <[email protected]>
@martin-frbg (Collaborator) commented

Do you know which configurations (if any) are failing besides Pentium? According to kernel/x86/KERNEL.P5 (which includes/redirects to KERNEL.P6), GEMM3M uses 2x2 kernels, which should be reflected in the M and N unroll values (where your PR currently has 8 and 4).
(I did touch some GEMM3M-related code in 0.3.29 (#4976), but that should only have affected the "generic" TARGET, which had a bogus "print parameters and die" implementation of GEMM3M before 0.3.28. Curious how a problem with the venerable P5/P6 target from the original GotoBLAS would come up only 15+ years later.)
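For context on why the unroll values matter: line 329 of `gemm3m_level3.c` in the log rounds half of `min_i` up to a multiple of `GEMM3M_UNROLL_M`, so the declared unroll factor directly shapes the panel sizes handed to the kernel. A small sketch reproducing just that arithmetic; `half_rounded_up` is a hypothetical name, and only the expression itself comes from the log:

```c
#include <assert.h>

/* Mirrors the expression from gemm3m_level3.c:329 in the log:
     min_i = ((min_i / 2 + GEMM3M_UNROLL_M - 1) / GEMM3M_UNROLL_M) * GEMM3M_UNROLL_M;
   i.e. take half of min_i and round it up to a multiple of unroll_m. */
static long half_rounded_up(long min_i, long unroll_m) {
    assert(unroll_m > 0);
    return ((min_i / 2 + unroll_m - 1) / unroll_m) * unroll_m;
}
```

With `min_i = 10` it yields 6 for `unroll_m = 2` (the 2x2 kernels mentioned above) and 8 for `unroll_m = 8` (the value in the PR), so a mismatch between the declared unroll factor and the kernel actually used would change how rows are split.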

@commodo (Contributor, Author) commented Aug 11, 2025

> How are you building OpenBLAS? What you saw is what happens when no TARGET option was specified, and autodetection of the cpu model by the getarch helper program failed.

Apologies for the slow reply.
As may already be obvious, this is built as part of the OpenWrt distribution.
The x86 (32-bit) target is affected; x86_64 looks fine.

@commodo (Contributor, Author) commented Aug 11, 2025

> Do you know which configurations (if any) are failing besides Pentium? According to kernel/x86/KERNEL.P5 (which includes/redirects to KERNEL.P6), GEMM3M is using 2x2 kernels, which should be reflected in the M and N (where your PR currently has 8 and 4). (I did touch some GEMM3M-related code in 0.3.29 (#4976) but that should only have affected the "generic" TARGET - which had a bogus "print parameters and die" implementation of GEMM3M before 0.3.28. Curious how a problem with the venerable P5/P6 target from original GotoBLAS would come up only 15+ years later)

Short answer: no. I don't know which configurations other than Pentium are failing.
The GitHub PR runs CI on a set of targets, so the ones currently passing are the only ones known so far.

Once the PR is merged, it goes through a more complete CI, and results appear here after 1-2 days under one of those build targets.
I know, it's not an ideal scenario.

> Curious how a problem with the venerable P5/P6 target from original GotoBLAS would come up only 15+ years later)

My understanding is that there is no `#define` providing any GEMM3M definitions for P5/P6, but I haven't looked too deeply into this one.
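If a P5/P6 block were added, it might look roughly like the sketch below. Assumptions: `HYPOTHETICAL_TARGET_P5` stands in for whatever target macro `TARGET=...` or getarch autodetection would provide; the 2x2 unroll follows martin-frbg's reading of `kernel/x86/KERNEL.P5`/`KERNEL.P6`; the P/Q/R values are placeholders. The `#error` check is an alternative to silent fallbacks: it turns the cascade of "undeclared" errors into one message pointing at target selection.

```c
#include <assert.h>

/* Stand-in for the target macro normally supplied by TARGET=... or by
   getarch autodetection (hypothetical name). */
#define HYPOTHETICAL_TARGET_P5 1

#if defined(HYPOTHETICAL_TARGET_P5)
/* 2x2 unroll to match the GEMM3M kernels KERNEL.P5/KERNEL.P6 reportedly use. */
#define CGEMM3M_DEFAULT_UNROLL_M 2
#define CGEMM3M_DEFAULT_UNROLL_N 2
/* Placeholder blocking sizes, not tuned values. */
#define CGEMM3M_DEFAULT_P 64
#define CGEMM3M_DEFAULT_Q 128
#define CGEMM3M_DEFAULT_R 2048
#endif

/* Stop with one readable message instead of many "undeclared" errors
   inside gemm3m_level3.c. */
#if !defined(CGEMM3M_DEFAULT_P) || !defined(CGEMM3M_DEFAULT_Q) || \
    !defined(CGEMM3M_DEFAULT_R) || !defined(CGEMM3M_DEFAULT_UNROLL_M) || \
    !defined(CGEMM3M_DEFAULT_UNROLL_N)
#error "CGEMM3M_DEFAULT_* missing for this target: set TARGET or check CPU detection"
#endif

/* Runtime sanity check of the selected parameter set. */
static void check_p5_gemm3m_params(void) {
    assert(CGEMM3M_DEFAULT_UNROLL_M == 2 && CGEMM3M_DEFAULT_UNROLL_N == 2);
    assert(CGEMM3M_DEFAULT_P > 0 && CGEMM3M_DEFAULT_Q > 0 && CGEMM3M_DEFAULT_R > 0);
}
```

Removing the `#define HYPOTHETICAL_TARGET_P5` line makes the sketch stop at the `#error` with that single message.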

commodo added a commit to openwrt/packages that referenced this pull request Aug 13, 2025
This change updates the version to 0.3.30
Added patch to fix x86 builds.

Also sent upstream:
   OpenMathLib/OpenBLAS#5418

Signed-off-by: Alexandru Ardelean <[email protected]>